EDA Kecelakaan Pesawat
¶

Step 1. Data Acquistion¶

In [1]:
# import semua package / module yang diperlukan

import pandas as pd
import datetime as dt
import plotly.express as px
import plotly.graph_objects as go
In [2]:
df = pd.read_csv('Airplane_Crashes_and_Fatalities_Since_1908.csv')

df.head()
Out[2]:
Date Time Location Operator Flight # Route Type Registration cn/In Aboard Fatalities Ground Summary
0 09/17/1908 17:18 Fort Myer, Virginia Military - U.S. Army NaN Demonstration Wright Flyer III NaN 1 2.0 1.0 0.0 During a demonstration flight, a U.S. Army fly...
1 07/12/1912 06:30 AtlantiCity, New Jersey Military - U.S. Navy NaN Test flight Dirigible NaN NaN 5.0 5.0 0.0 First U.S. dirigible Akron exploded just offsh...
2 08/06/1913 NaN Victoria, British Columbia, Canada Private - NaN Curtiss seaplane NaN NaN 1.0 1.0 0.0 The first fatal airplane accident in Canada oc...
3 09/09/1913 18:30 Over the North Sea Military - German Navy NaN NaN Zeppelin L-1 (airship) NaN NaN 20.0 14.0 0.0 The airship flew into a thunderstorm and encou...
4 10/17/1913 10:30 Near Johannisthal, Germany Military - German Navy NaN NaN Zeppelin L-2 (airship) NaN NaN 30.0 30.0 0.0 Hydrogen gas which was being vented was sucked...
In [3]:
# melihat isi tiap kolom dari dataset (menentukan perlu cleaning bagian mana)

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5268 entries, 0 to 5267
Data columns (total 13 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Date          5268 non-null   object 
 1   Time          3049 non-null   object 
 2   Location      5248 non-null   object 
 3   Operator      5250 non-null   object 
 4   Flight #      1069 non-null   object 
 5   Route         3561 non-null   object 
 6   Type          5241 non-null   object 
 7   Registration  4933 non-null   object 
 8   cn/In         4040 non-null   object 
 9   Aboard        5246 non-null   float64
 10  Fatalities    5256 non-null   float64
 11  Ground        5246 non-null   float64
 12  Summary       4878 non-null   object 
dtypes: float64(3), object(10)
memory usage: 535.2+ KB
In [4]:
df.describe()
Out[4]:
Aboard Fatalities Ground
count 5246.000000 5256.000000 5246.000000
mean 27.554518 20.068303 1.608845
std 43.076711 33.199952 53.987827
min 0.000000 0.000000 0.000000
25% 5.000000 3.000000 0.000000
50% 13.000000 9.000000 0.000000
75% 30.000000 23.000000 0.000000
max 644.000000 583.000000 2750.000000

Step 2. Data Wrangling / Cleaning¶

In [5]:
# 1. Mengubah tipe data date

print(type(df['Date'][0])) # Data date berupa string, alangkah baiknya kita ubah jadi datetime
<class 'str'>
In [6]:
df['Date'] = pd.to_datetime(df['Date'])

df.head()
Out[6]:
Date Time Location Operator Flight # Route Type Registration cn/In Aboard Fatalities Ground Summary
0 1908-09-17 17:18 Fort Myer, Virginia Military - U.S. Army NaN Demonstration Wright Flyer III NaN 1 2.0 1.0 0.0 During a demonstration flight, a U.S. Army fly...
1 1912-07-12 06:30 AtlantiCity, New Jersey Military - U.S. Navy NaN Test flight Dirigible NaN NaN 5.0 5.0 0.0 First U.S. dirigible Akron exploded just offsh...
2 1913-08-06 NaN Victoria, British Columbia, Canada Private - NaN Curtiss seaplane NaN NaN 1.0 1.0 0.0 The first fatal airplane accident in Canada oc...
3 1913-09-09 18:30 Over the North Sea Military - German Navy NaN NaN Zeppelin L-1 (airship) NaN NaN 20.0 14.0 0.0 The airship flew into a thunderstorm and encou...
4 1913-10-17 10:30 Near Johannisthal, Germany Military - German Navy NaN NaN Zeppelin L-2 (airship) NaN NaN 30.0 30.0 0.0 Hydrogen gas which was being vented was sucked...
In [7]:
# 2. Extract Tahun nya saja dari date

df['Year'] = pd.DatetimeIndex(df['Date']).year

df.head()
Out[7]:
Date Time Location Operator Flight # Route Type Registration cn/In Aboard Fatalities Ground Summary Year
0 1908-09-17 17:18 Fort Myer, Virginia Military - U.S. Army NaN Demonstration Wright Flyer III NaN 1 2.0 1.0 0.0 During a demonstration flight, a U.S. Army fly... 1908
1 1912-07-12 06:30 AtlantiCity, New Jersey Military - U.S. Navy NaN Test flight Dirigible NaN NaN 5.0 5.0 0.0 First U.S. dirigible Akron exploded just offsh... 1912
2 1913-08-06 NaN Victoria, British Columbia, Canada Private - NaN Curtiss seaplane NaN NaN 1.0 1.0 0.0 The first fatal airplane accident in Canada oc... 1913
3 1913-09-09 18:30 Over the North Sea Military - German Navy NaN NaN Zeppelin L-1 (airship) NaN NaN 20.0 14.0 0.0 The airship flew into a thunderstorm and encou... 1913
4 1913-10-17 10:30 Near Johannisthal, Germany Military - German Navy NaN NaN Zeppelin L-2 (airship) NaN NaN 30.0 30.0 0.0 Hydrogen gas which was being vented was sucked... 1913
In [8]:
# 3. Amputasi data aboard yang NaN
df.dropna(subset=['Aboard'], inplace = True)
df.head()
Out[8]:
Date Time Location Operator Flight # Route Type Registration cn/In Aboard Fatalities Ground Summary Year
0 1908-09-17 17:18 Fort Myer, Virginia Military - U.S. Army NaN Demonstration Wright Flyer III NaN 1 2.0 1.0 0.0 During a demonstration flight, a U.S. Army fly... 1908
1 1912-07-12 06:30 AtlantiCity, New Jersey Military - U.S. Navy NaN Test flight Dirigible NaN NaN 5.0 5.0 0.0 First U.S. dirigible Akron exploded just offsh... 1912
2 1913-08-06 NaN Victoria, British Columbia, Canada Private - NaN Curtiss seaplane NaN NaN 1.0 1.0 0.0 The first fatal airplane accident in Canada oc... 1913
3 1913-09-09 18:30 Over the North Sea Military - German Navy NaN NaN Zeppelin L-1 (airship) NaN NaN 20.0 14.0 0.0 The airship flew into a thunderstorm and encou... 1913
4 1913-10-17 10:30 Near Johannisthal, Germany Military - German Navy NaN NaN Zeppelin L-2 (airship) NaN NaN 30.0 30.0 0.0 Hydrogen gas which was being vented was sucked... 1913
In [9]:
# 4. Imputasi data ground

df['Ground'] = df['Ground'].fillna(0)
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 5246 entries, 0 to 5267
Data columns (total 14 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   Date          5246 non-null   datetime64[ns]
 1   Time          3049 non-null   object        
 2   Location      5227 non-null   object        
 3   Operator      5228 non-null   object        
 4   Flight #      1067 non-null   object        
 5   Route         3556 non-null   object        
 6   Type          5221 non-null   object        
 7   Registration  4914 non-null   object        
 8   cn/In         4029 non-null   object        
 9   Aboard        5246 non-null   float64       
 10  Fatalities    5246 non-null   float64       
 11  Ground        5246 non-null   float64       
 12  Summary       4863 non-null   object        
 13  Year          5246 non-null   int32         
dtypes: datetime64[ns](1), float64(3), int32(1), object(9)
memory usage: 594.3+ KB
In [10]:
# 5. Amputasi data NaN yang tersisa
# a. Kolom / Column / Series

del df['Time']
del df['Flight #']
del df['Route']
del df['Registration']
del df['cn/In']

df
Out[10]:
Date Location Operator Type Aboard Fatalities Ground Summary Year
0 1908-09-17 Fort Myer, Virginia Military - U.S. Army Wright Flyer III 2.0 1.0 0.0 During a demonstration flight, a U.S. Army fly... 1908
1 1912-07-12 AtlantiCity, New Jersey Military - U.S. Navy Dirigible 5.0 5.0 0.0 First U.S. dirigible Akron exploded just offsh... 1912
2 1913-08-06 Victoria, British Columbia, Canada Private Curtiss seaplane 1.0 1.0 0.0 The first fatal airplane accident in Canada oc... 1913
3 1913-09-09 Over the North Sea Military - German Navy Zeppelin L-1 (airship) 20.0 14.0 0.0 The airship flew into a thunderstorm and encou... 1913
4 1913-10-17 Near Johannisthal, Germany Military - German Navy Zeppelin L-2 (airship) 30.0 30.0 0.0 Hydrogen gas which was being vented was sucked... 1913
... ... ... ... ... ... ... ... ... ...
5263 2009-05-20 Near Madiun, Indonesia Military - Indonesian Air Force Lockheed C-130 Hercules 112.0 98.0 2.0 While on approach, the military transport cras... 2009
5264 2009-05-26 Near Isiro, DemocratiRepubliCongo Service Air Antonov An-26 4.0 4.0 0.0 The cargo plane crashed while on approach to I... 2009
5265 2009-06-01 AtlantiOcean, 570 miles northeast of Natal, Br... Air France Airbus A330-203 228.0 228.0 0.0 The Airbus went missing over the AtlantiOcean ... 2009
5266 2009-06-07 Near Port Hope Simpson, Newfoundland, Canada Strait Air Britten-Norman BN-2A-27 Islander 1.0 1.0 0.0 The air ambulance crashed into hills while att... 2009
5267 2009-06-08 State of Arunachal Pradesh, India Military - Indian Air Force Antonov An-32 13.0 13.0 0.0 The military transport went missing while en r... 2009

5246 rows × 9 columns

In [11]:
# b. baris / row / index

df.dropna(subset=['Location','Operator','Type'], inplace=True)
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 5191 entries, 0 to 5267
Data columns (total 9 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Date        5191 non-null   datetime64[ns]
 1   Location    5191 non-null   object        
 2   Operator    5191 non-null   object        
 3   Type        5191 non-null   object        
 4   Aboard      5191 non-null   float64       
 5   Fatalities  5191 non-null   float64       
 6   Ground      5191 non-null   float64       
 7   Summary     4823 non-null   object        
 8   Year        5191 non-null   int32         
dtypes: datetime64[ns](1), float64(3), int32(1), object(4)
memory usage: 385.3+ KB
In [12]:
df = df.reset_index()

del df['index']
df
Out[12]:
Date Location Operator Type Aboard Fatalities Ground Summary Year
0 1908-09-17 Fort Myer, Virginia Military - U.S. Army Wright Flyer III 2.0 1.0 0.0 During a demonstration flight, a U.S. Army fly... 1908
1 1912-07-12 AtlantiCity, New Jersey Military - U.S. Navy Dirigible 5.0 5.0 0.0 First U.S. dirigible Akron exploded just offsh... 1912
2 1913-08-06 Victoria, British Columbia, Canada Private Curtiss seaplane 1.0 1.0 0.0 The first fatal airplane accident in Canada oc... 1913
3 1913-09-09 Over the North Sea Military - German Navy Zeppelin L-1 (airship) 20.0 14.0 0.0 The airship flew into a thunderstorm and encou... 1913
4 1913-10-17 Near Johannisthal, Germany Military - German Navy Zeppelin L-2 (airship) 30.0 30.0 0.0 Hydrogen gas which was being vented was sucked... 1913
... ... ... ... ... ... ... ... ... ...
5186 2009-05-20 Near Madiun, Indonesia Military - Indonesian Air Force Lockheed C-130 Hercules 112.0 98.0 2.0 While on approach, the military transport cras... 2009
5187 2009-05-26 Near Isiro, DemocratiRepubliCongo Service Air Antonov An-26 4.0 4.0 0.0 The cargo plane crashed while on approach to I... 2009
5188 2009-06-01 AtlantiOcean, 570 miles northeast of Natal, Br... Air France Airbus A330-203 228.0 228.0 0.0 The Airbus went missing over the AtlantiOcean ... 2009
5189 2009-06-07 Near Port Hope Simpson, Newfoundland, Canada Strait Air Britten-Norman BN-2A-27 Islander 1.0 1.0 0.0 The air ambulance crashed into hills while att... 2009
5190 2009-06-08 State of Arunachal Pradesh, India Military - Indian Air Force Antonov An-32 13.0 13.0 0.0 The military transport went missing while en r... 2009

5191 rows × 9 columns

In [13]:
df['Death'] = df['Fatalities'] + df['Ground']

df.head()
Out[13]:
Date Location Operator Type Aboard Fatalities Ground Summary Year Death
0 1908-09-17 Fort Myer, Virginia Military - U.S. Army Wright Flyer III 2.0 1.0 0.0 During a demonstration flight, a U.S. Army fly... 1908 1.0
1 1912-07-12 AtlantiCity, New Jersey Military - U.S. Navy Dirigible 5.0 5.0 0.0 First U.S. dirigible Akron exploded just offsh... 1912 5.0
2 1913-08-06 Victoria, British Columbia, Canada Private Curtiss seaplane 1.0 1.0 0.0 The first fatal airplane accident in Canada oc... 1913 1.0
3 1913-09-09 Over the North Sea Military - German Navy Zeppelin L-1 (airship) 20.0 14.0 0.0 The airship flew into a thunderstorm and encou... 1913 14.0
4 1913-10-17 Near Johannisthal, Germany Military - German Navy Zeppelin L-2 (airship) 30.0 30.0 0.0 Hydrogen gas which was being vented was sucked... 1913 30.0
In [14]:
df['Aboard'] = df['Aboard'].astype(int)
df['Fatalities'] = df['Fatalities'].astype(int)
df['Ground'] = df['Ground'].astype(int)

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5191 entries, 0 to 5190
Data columns (total 10 columns):
 #   Column      Non-Null Count  Dtype         
---  ------      --------------  -----         
 0   Date        5191 non-null   datetime64[ns]
 1   Location    5191 non-null   object        
 2   Operator    5191 non-null   object        
 3   Type        5191 non-null   object        
 4   Aboard      5191 non-null   int32         
 5   Fatalities  5191 non-null   int32         
 6   Ground      5191 non-null   int32         
 7   Summary     4823 non-null   object        
 8   Year        5191 non-null   int32         
 9   Death       5191 non-null   float64       
dtypes: datetime64[ns](1), float64(1), int32(4), object(4)
memory usage: 324.6+ KB

Step 3. Analysis and Visualization¶

In [15]:
# Analisa jumlah kecelakaan tiap tahun
px.histogram(df,x='Year')
19201940196019802000050100150200
Yearcount
plotly-logomark
In [16]:
# Analisa jumlah korban jiwa tiap tahun
fig = px.scatter(df, x='Year', y = 'Death')

fig.add_annotation(
    x="2001",
    y=2815,
    text="Anomali pada data, dikarenakan kejadian 9/11",
    showarrow=True,
    xanchor="right",
)

fig.show()
19201940196019802000050010001500200025003000
YearDeathAnomali pada data, dikarenakan kejadian 9/11
plotly-logomark
In [17]:
px.scatter(df['Year'].value_counts().to_frame().reset_index(),
           x = 'Year',
           y = 'count',
           trendline = 'lowess')
19201940196019802000020406080100
Yearcount
plotly-logomark
In [18]:
px.scatter(df['Operator'].value_counts().to_frame().nlargest(50,'count').reset_index(),
           x = 'Operator',
           y = 'count',
           title = 'Top 50 Maskapai dengan jumlah kecelakaan tertinggi')
Military - U.S. Air ForceAeroflotAir FranceDeutsche LufthansaUnited Air LinesAir TaxiMilitary - U.S. Army Air ForcesPan American World AirwaysMilitary - Royal Air ForceMilitary - U.S. NavyAmerican AirlinesChina National Aviation CorporationUS Aerial Mail ServiceIndian AirlinesPhilippine Air LinesKLM Royal Dutch AirlinesPrivateNorthwest Orient AirlinesEastern Air LinesBritish Overseas AirwaysAeropostaleAVIANCASabenaMilitary - U.S. ArmyMilitary -Royal Air ForceImperial AirwaysGaruda Indonesia AirlinesTrans Continental and Western AirBritish European AirwaysVASPTrans World AirlinesVarigMerpati Nusantara AirlinesCeskoslovenske AerolinieLinea Aeropostal VenezolanaCubana de AviacionMilitary - Indian Air ForcePakistan International AirlinesMilitary - German NavyMilitary - Afghan Republican Air ForceCruzeiroMilitary - Russian Air ForceChina Airlines (Taiwan)Turkish Airlines (THY)Military - U.S. Marine CorpsQantasLOT Polish AirlinesEthiopian AirlinesPanair do BrasilDelta Air Lines050100150
Top 50 Maskapai dengan jumlah kecelakaan tertinggiOperatorcount
plotly-logomark
In [19]:
px.scatter(df['Type'].value_counts().to_frame().nlargest(50,'count').reset_index(),
           x = 'Type',
           y = 'count',
           title = 'Top 50 Tipe Pesawat dengan jumlah kecelakaan tertinggi')
Douglas DC-3de Havilland Canada DHC-6 Twin Otter 300Douglas C-47ADouglas C-47Douglas DC-4Yakovlev YAK-40Antonov AN-26Junkers JU-52/3mDouglas C-47BDe Havilland DH-4Douglas DC-6BBreguet 14Curtiss C-46ADouglas DC-6Antonov AN-24Antonov AN-12Douglas C-47-DLMcDonnell Douglas DC-9-32Curtiss C-46Embraer 110P1 BandeiranteTupolev TU-134AJunkers F-13Fokker F-27 Friendship 600Lockheed C-130Hde Havilland Canada DHC-6 Twin Otter 100Fokker F-27 Friendship 200Cessna 208B Grand CaravanLockheed 18 LodestarBoeing KC-135Ade Havilland Canada DHC-6 Twin Otter 200Lockheed C-130E HerculesCASA 212 Aviocar 200Britten-Norman BN-2A IslanderCurtiss C-46FPiper PA-31-350 NavajoLockheed C-130H Herculesde Havilland DHC-2Piper PA-31Cessna 402BCessna 208B Caravan I Super CargomasterDouglas DC-3CCurtiss C-46DFokker F-27 Friendship 100Piper PA-23Ilyushin IL-18Boeing 40Fokker F-28 Fellowship 1000Douglas C-54APiper PA-32Antonov AN-24B0100200300
Top 50 Tipe Pesawat dengan jumlah kecelakaan tertinggiTypecount
plotly-logomark
In [20]:
px.scatter(df[['Operator','Type']].value_counts().to_frame().reset_index(),
           x = 'Operator',
           y = 'Type',
          height = 800)
US Aerial Mail ServiceREALAir IndiaSouthern AirwaysMilitary - Azerbaijan Air ForceMilitary - United States Air ForceNordeste Linhas AereasNorth PacifiAirlinesMinerva AirlinesMilitary - Força Aérea BrasileiraMilitary - IslamiRevolution's Guards CorpsLíder Táxi AéreoMetro CargoMilitary - Mauritanian Air ForceTrans Guyana AirwaysTransporte Aereo FederalTAME EcuadorTARWenela Air ServicesYute Air Alaska - Air TaxiUni AirUniversal AirwaysPrivate air taxiRubner Flying ServicePhoenix AviationPrivate - Parajump air showSmyer Aircraft - Air TaxiSudan AirwaysSafe Air ComplanyServicios AmericanosLufthansa CitylineAstrd Wing Aviaiton - Air TaxiAir Transport AssociatesAir Taxi - TransamericaAirfast IndonesiaCAAC Air TraffiManagement BureauAvio Linee ItalianeBinter MediterráneoAeronautical Services - Air TaxiAir AlbatrossAero CargoAero Taxis EquatorianosAir MartiniqueAir Taxi - Island AirlinesAir CentralHesler Noble - Air TaxiItaviaGinger Coote AirwaysHarrington's IncCalifornia Air CharterLloyd Aero BolivianoJC AirL'Armée de L'AirCommutairDHL Aviation AfricaChina Eastern AirlinesCenturian Air CargoF.R. de la Direccion General de AdosFranco-RoumaineDan-Air ServicesEAS Airlines (Executive Airline Services)De Havilland DH-4Beechcraft C99Antonov An-24RVCessna UC-35D Citation EncoreDouglas R5D2Convair CV-240-2Zeppelin Dixmunde (airship)PT-LCNDouglas C-47-DKBoeing Vertol Chinook HC-2 (helicopter)Canadair DC-4M-2 NorthstarGeneral Aviation GA-43Curtis C-46A-20-CUCaravelle VIREnstrom F-28FJunkers Ju-52/3mde Havilland Canada DHC-6-300 Twin OtterBoeing 707-328CSikorsky S-58ETBritten-Norman BN-2A-21 TrislanderAntonov AN-24RV / Soviet Air Force TU-16Domier Delphin IIIBoeing B-727-230Vickers 614 Viking 1Boeing B-737-2P6Boeing 747-2B5FBlériot Spad 46Vickers 837 Viscount
OperatorType
plotly-logomark
In [193]:
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="my_geocoder")
def get_lat_long(location):
    while True:
        try:
            geocode_result = geolocator.geocode(location)
            if geocode_result:
                return geocode_result.latitude, geocode_result.longitude
            else:
                return None, None
        except Exception as e:
            return None, None

df['Latitude'], df['Longitude'] = zip(*df['Location'].apply(get_lat_long))

px.scatter_geo(df,
               lat = 'Latitude', lon = 'Longitude',
               hover_data = ['Location', 'Year','Death','Summary'],
               color = 'Death', 
               title = 'Peta Kecelakaan Pesawat')
05001000150020002500DeathPeta Kecelakaan Pesawat
plotly-logomark

Step 4. Conclustion¶

  1. Korban Jiwa tertinggi dari Kecelakaan Pesawat Terjadi di Tahun 2001, dikarenakan kasus 9/11
  2. Puncak trendline kecelakaan pesawat terjadi pada tahun 1978, dimana setelah itu, jumlah kecelakaan kian menurun.
  3. 2 Maskapai dengan jumlah kecelakaaan paling tinggi (176) adalah US Military Air Force dan Aeroflot
  4. Tipe pesawat yang paling kerap kecelakaan adalah Douglas DC-3